TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents
Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. The policies learned with our proposed attack approach perform imperceptibly similarly to benign policies but deteriorate drastically when the Trojan is triggered, in both targeted and untargeted settings. Furthermore, we show that existing Trojan defense mechanisms for classification tasks are not effective in the reinforcement learning setting.
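The poisoning mechanism described above can be illustrated with a minimal sketch. The function below is hypothetical and not from the TrojDRL codebase: it stamps a trigger pattern onto a tiny fraction of states in a training batch, forces the attacker's target action, and assigns an in-band reward (one within the environment's normal reward range), leaving all other transitions untouched.

```python
import random

def poison_batch(states, actions, rewards, trigger, target_action,
                 poison_rate=0.00025):
    """Illustrative targeted-poisoning step for a DRL training batch.

    With probability poison_rate (0.025% here, matching the fraction
    reported in the abstract), stamp the trigger onto the state, force
    the target action, and set an in-band reward. Clean transitions
    pass through unchanged, so the reward on normal inputs is unaffected.
    """
    poisoned = []
    for s, a, r in zip(states, actions, rewards):
        if random.random() < poison_rate:
            s = [sv + tv for sv, tv in zip(s, trigger)]  # stamp trigger pattern
            a = target_action  # hidden behavior the Trojan should elicit
            r = 1.0            # in-band reward reinforcing the target action
        poisoned.append((s, a, r))
    return poisoned
```

In an untargeted variant, the forced action would instead be chosen to degrade return (e.g., a random or adversarially selected action) rather than a single fixed target.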
TrojDRL: evaluation of backdoor attacks on deep reinforcement learning
We present TrojDRL, a tool for exploring and evaluating backdoor attacks on deep reinforcement learning agents. TrojDRL exploits the sequential nature of deep reinforcement learning (DRL) and considers different gradations of threat models. We show that untargeted attacks on state-of-the-art actor-critic algorithms can circumvent existing defenses built on the assumption that backdoors are targeted. We evaluated TrojDRL on a broad set of DRL benchmarks and showed that the attacks require poisoning as little as 0.025% of the training data. Compared with existing work on backdoor attacks against classification models, TrojDRL provides a first step towards understanding the vulnerability of DRL agents.